Skip to content

Enhanced Ingestion with Named Graph Restrictions #19

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 65 commits into
base: main
Choose a base branch
from

Conversation

tekrajchhetri
Copy link
Collaborator

@tekrajchhetri tekrajchhetri commented Feb 21, 2025

Enhanced Ingestion with Named Graph Restrictions

Features Implemented

  • JSON-LD Ingestion: Supports ingestion of single or multiple JSON-LD files.
  • Turtle Ingestion: Enables ingestion of single or multiple Turtle files.
  • Restricted Ingestion: Ensures that data is only ingested into pre-registered named graphs, preventing unintended ingestion of arbitrary data.
  • PROV Metadata Tracking:
    • Captures provenance data for each ingestion activity.
    • Records metadata about the named graphs.
    • Uses the PROV ontology and DCTERMS (Dublin Core) for recording metadata and provenance.

):
if not files:
raise HTTPException(status_code=400, detail="No files provided")
named_graph_iri = named_graph_exists(graph)
if named_graph_iri["status"] != True:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean if the status is not True? Should the HTTP response back not be a 200?

@@ -378,7 +414,8 @@ async def ingest_knowledge_graphs_batch(


@router.post("/upload/document", summary="Ingest a either TXT, JSON and PDF files",
dependencies=[Depends(require_scopes(["write"]))]
dependencies=[Depends(require_scopes(["write"]))],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you also want to include the LoggedIn annotation kwarg that you've been including elsewhere?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I want to include it.

"user": posting_user,
"filename": file.filename,
"extension": file.filename.split('.')[-1].lower()
})



@router.post("/upload/documents",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

try:
response = requests.get(endpoint)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requests is a synchronous call -- is this function eventually being called in an upstream async call?

Comment on lines +178 to +181
return {
"status": "error",
"message": f"Error connecting to query service: {str(e)}"
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this have an HTTP status code indicating it is an error returned as well?


logger = logging.getLogger(__name__)

async def background_task():

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also be apart of the @app.on_event("startup")?

- RABBITMQ_URL, i.e., the hostname, by default it is localhost
- RABBITMQ_PORT, by default 5672 is used
- RABBITMQ_VHOST, default vhost is "/"
-

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to add something here regarding the ingestion URL?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants